187 research outputs found

    Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

    Get PDF
    Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM

    Combinatorial problems in solving linear systems

    Get PDF
    42 pages, available as LIP research report RR-2009-15Numerical linear algebra and combinatorial optimization are vast subjects; as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today's numerical linear algebra computations consists of the solution of sparse linear system by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; task assignment and scheduling for parallel multifrontal solvers. On the iterative method side, we discuss preconditioning techniques including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices

    High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors

    Get PDF
    International audience—We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general N-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an N-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with (N − 1) matrices (TTMc step); (ii) the product is then converted to a matrix; and (iii) a few leading left singular vectors of the resulting matrix are computed (TRSVD step) to update one of the matrices for the next TTMc step. We propose an efficient parallelization of these algorithms for the current parallel platforms with multicore nodes. We discuss a set of preprocessing steps which takes all computational decisions out of the main iteration of the algorithm and provides an intuitive shared-memory parallelism for the TTM and TRSVD steps. We propose a coarse and a fine-grain parallel algorithm in a distributed memory environment, investigate data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the TRSVD step can be carried out efficiently following the TTMc step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report scalability results on up to 4096 cores on 256 nodes of an IBM BlueGene/Q supercomputer

    Parallel sparse matrix-vector multiplies and iterative solvers

    Get PDF
    Cataloged from PDF version of article.Sparse matrix-vector multiply (SpMxV) operations are in the kernel of many scientific computing applications. Therefore, efficient parallelization of SpMxV operations is of prime importance to scientific computing community. Previous works on parallelizing SpMxV operations consider maintaining the load balance among processors and minimizing the total message volume. We show that the total message latency (start-up time) may be more important than the total message volume. We also stress that the maximum message volume and latency handled by a single processor are important communication cost metrics that should be minimized. We propose hypergraph models and hypergraph partitioning methods to minimize these four communication cost metrics in one dimensional and two dimensional partitioning of sparse matrices. Iterative methods used for solving linear systems appear to be the most common context in which SpMxV operations arise. Usually, these iterative methods apply a technique called preconditioning. Approximate inverse preconditioning—which can be applied to a large class of unsymmetric and symmetric matrices—replaces an SpMxV operation by a series of SpMxV operations. That is, a single SpMxV operation is only a piece of a larger computation in the iterative methods that use approximate inverse preconditioning. In these methods, there are interactions in the form of dependencies between the successive SpMxV operations. These interactions necessitate partitioning the matrices simultaneously in order to parallelize a full step of the subject class of iterative methods efficiently. We show that the simultaneous partitioning requirement gives rise to various matrix partitioning models depending on the iterative method used. We list the partitioning models for a number of widely used iterative methods. We propose operations to build a composite hypergraph by combining the previously proposed hypergraph models and show that partitioning the composite hypergraph models addresses the simultaneous matrix partitioning problem. We strove to demonstrate how the proposed partitioning methods—both the one that addresses multiple communication cost metrics and the other that addresses the simultaneous partitioning problem—help in practice. We implemented a library and investigated the performances of the partitioning methods. These practical investigations revealed a problem that we call message ordering problem. The problem asks how to organize the send operations to minimize the completion time of a certain class of parallel programs. We show how to solve the message ordering problem optimally under reasonable assumptions.Uçar, BoraPh.D

    Partitioning sparse rectangular matrices for parallel computing of AAtX

    Get PDF
    Ankara : Department of Computer Engineering and Information Science and The Institute of Engineering and Science of Bilkent University, 1999.Thesis (Master's) -- Bilkent University, 1999.Includes bibliographical references.Uçar, BoraM.S

    Two approximation algorithms for bipartite matching on multicore architectures

    Get PDF
    International audienceWe propose two heuristics for the bipartite matching problem that are amenable to shared-memory parallelization. The first heuristic is very intriguing from a parallelization perspective. It has no significant algorithmic synchronization overhead and no conflict resolution is needed across threads. We show that this heuristic has an approximation ratio of around 0.632 under some common conditions. The second heuristic is designed to obtain a larger matching by employing the well-known Karp-Sipser heuristic on a judiciously chosen subgraph of the original graph. We show that the Karp-Sipser heuristic always finds a maximum cardinality matching in the chosen subgraph. Although the Karp-Sipser heuristic is hard to parallelize for general graphs, we exploit the structure of the selected subgraphs to propose a specialized implementation which demonstrates very good scalability. We prove that this second heuristic has an approximation guarantee of around 0.866 under the same conditions as in the first algorithm. We discuss parallel implementations of the proposed heuristics on a multicore architecture. Experimental results, for demonstrating speed-ups and verifying the theoretical results in practice, are provided

    Comments on the hierarchically structured bin packing problem

    Get PDF
    International audienceWe study the hierarchically structured bin packing problem. In this problem, the items to be packed into bins are at the leaves of a tree. The objective of the packing is to minimize the total number of bins into which the descendants of an internal node are packed, summed over all internal nodes. We investigate an existing algorithm and make a correction to the analysis of its approximation ratio. Further results regarding the structure of an optimal solution and a strengthened inapproximability result are given

    On partitioning problems with complex objectives

    Get PDF
    Hypergraph and graph partitioning tools are used to partition work for efficient parallelization of many sparse matrix computations. Most of the time, the objective function that is reduced by these tools relates to reducing the communication requirements, and the balancing constraints satisfied by these tools relate to balancing the work or memory requirements. Sometimes, the objective sought for having balance is a complex function of the partition. We describe some important class of parallel sparse matrix computations that have such balance objectives. For these cases, the current state of the art partitioning tools fall short of being adequate. To the best of our knowledge, there is only a single algorithmic framework in the literature to address such balance objectives. We propose another algorithmic framework to tackle complex objectives and experimentally investigate the proposed framework.Les outils de partitionnement de graphes et d'hypergraphes interviennent pour parallĂ©liser efficacement de nombreux algorithmes liĂ©s aux matrices creuses. La plupart du temps, la fonction objectif minimisĂ©e par ces outils est liĂ©e au besoin de rĂ©duire les coĂ»ts de communication, tandis que les contraintes d'Ă©quilibre Ă  satisfaire sont elles liĂ©es Ă  l'Ă©quilibrage de la charge ou de la consommation mĂ©moire. Parfois, l'objectif d'Ă©quilibre est une fonction complexe du partitionnement. Nous dĂ©crivons plusieurs applications majeures de calcul parallĂšle sur des matrices creuses oĂč de telles contraintes d'Ă©quilibre apparaissent. Pour ces exemples, mĂȘme les outils de partitionnement les plus pointus sont loin d'ĂȘtre adĂ©quats. Pour autant que nous sachions, il n'existe dans la littĂ©rature qu'un seul cadre algorithmique qui traite ces problĂšmes. Nous proposons ici une nouvelle approche algorithmique et fournissons des rĂ©sultats d'expĂ©riences la mettant en Ɠuvre

    On optimal tree traversals for sparse matrix factorization

    Get PDF
    12 pagesWe study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. Such workflows typically arise in the multifrontal method of sparse matrix factorization. We target a classical two-level memory system, where the main memory is faster but smaller than the secondary memory. A task in the workflow can be processed if all its predecessors have been processed, and if its input and output files fit in the currently available main memory. The amount of available memory at a given time depends upon the ordering in which the tasks are executed. What is the minimum amount of main memory, over all postorder schemes, or over all possible traversals, that is needed for an in-core execution? We establish several complexity results that answer these questions. We propose a new, polynomial time, exact algorithm which runs faster than a reference algorithm. Next, we address the setting where the required memory renders a pure in-core solution unfeasible. In this setting, we ask the following question: what is the minimum amount of I/O that must be performed between the main memory and the secondary memory? We show that this latter problem is NP-hard, and propose efficient heuristics. All algorithms and heuristics are thoroughly evaluated on assembly trees arising in the context of sparse matrix factorizations

    Investigations on push-relabel based algorithms for the maximum transversal problem

    Get PDF
    We investigate the push-relabel algorithm for solving the problem of finding a maximum cardinality matching in a bipartite graph in the context of the maximum transversal problem. We describe in detail an optimized yet easy-to-implement version of the algorithm and fine-tune its parameters. We also introduce new performance-enhancing techniques. On a wide range of real-world instances, we compare the push-relabel algorithm with state-of-the-art augmenting path-based algorithms and the recently proposed pseudoflow approach. We conclude that a carefully tuned push-relabel algorithm is competitive with all known augmenting path-based algorithms, and superior to the pseudoflow-based ones.Nous Ă©tudions le problĂšme de couplage maximum dans des graphes bipartis. Nous dĂ©crivons en dĂ©tail une version optimisĂ©e de l'algorithme en ajustant ses paramĂštres. L'algorithme est facile Ă  mettre en Ɠuvre. Nous introduisons Ă©galement de nouvelles techniques pour amĂ©liorer la performance de l'algorithme. Sur un large Ă©ventail de cas du monde rĂ©el, nous comparons l'algorithme Push-Relabel avec des algorithmes basĂ©s sur les concepts de chemins augmentants et de pseudoflot rĂ©cemment proposĂ©s. Nous concluons qu'un algorithme de type Push-Relabel soigneusement rĂ©glĂ© est en concurrence avec tous les algorithmes connus de type chemins augmentants, et supĂ©rieur Ă  ceux de type pseudoflot
    • 

    corecore